Using Negation and Phrases in Inducing Rules for Text Classification

Authors

  • Stephanie Chua
  • Frans Coenen
  • Grant Malcolm
  • Matias Garcia-Constantino
Abstract

An investigation into the use of negation in Inductive Rule Learning (IRL) for text classification is described. The use of negated features in the IRL process has been shown to improve the effectiveness of classification. However, although for small datasets it is perfectly feasible to include the potential negation of every feature as part of the feature space, this is not possible for datasets with large numbers of features, such as those used in text mining applications. Instead, a process whereby features to be negated can be identified dynamically is required. Such a process is described in the paper and compared with established techniques (JRip, NaiveBayes, Sequential Minimal Optimization (SMO), OlexGreedy). The work is also directed at an approach to text classification based on a “bag of phrases” representation, the motivation being that a phrase contains semantic information that is not present in a single keyword. In addition, a given text corpus typically contains many more key-phrase features than keyword features, thereby providing more potential features to be negated.

Author affiliation (all four authors): Department of Computer Science, University of Liverpool, Ashton Building, Ashton Street, L69 3BX Liverpool, UK.
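The idea in the abstract of choosing features to negate dynamically, rather than adding the negation of every feature to the feature space, can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a rule is a set of required features plus a set of negated features, takes word bigrams as the "bag of phrases", and uses a simple hypothetical scoring heuristic. The names `bag_of_phrases`, `Rule`, and `refine_with_negation` are illustrative only.

```python
from dataclasses import dataclass, field


def bag_of_phrases(text, n=2):
    """Return the set of lowercased word n-grams (phrases) in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


@dataclass
class Rule:
    label: str
    required: set = field(default_factory=set)  # features that must be present
    negated: set = field(default_factory=set)   # features that must be absent

    def covers(self, features):
        """A rule fires if all required features occur and no negated one does."""
        return self.required <= features and not (self.negated & features)


def refine_with_negation(rule, docs):
    """Return a copy of `rule` with at most one extra negated feature.

    Candidates are drawn only from the negative documents the rule currently
    covers, so the full set of possible negations is never enumerated.
    Hypothetical heuristic: negatives excluded minus positives lost.
    """
    covered_pos = [f for f, y in docs if y == rule.label and rule.covers(f)]
    covered_neg = [f for f, y in docs if y != rule.label and rule.covers(f)]
    candidates = set().union(*covered_neg) - rule.required - rule.negated

    def score(feature):
        excluded = sum(feature in f for f in covered_neg)
        lost = sum(feature in f for f in covered_pos)
        return excluded - lost

    if not candidates:
        return rule
    best = max(sorted(candidates), key=score)  # sorted for determinism
    if score(best) <= 0:
        return rule
    return Rule(rule.label, set(rule.required), rule.negated | {best})


# Toy corpus: (phrase set, class label) pairs.
docs = [
    (bag_of_phrases("apple pie recipe"), "food"),
    (bag_of_phrases("apple pie bakery"), "food"),
    (bag_of_phrases("apple pie chart"), "tech"),
]
rule = Rule("food", required={"apple pie"})
refined = refine_with_negation(rule, docs)
```

In this toy corpus the initial rule covers all three documents. The only candidate that occurs in the covered negative document and is not already required is the phrase "pie chart", so the refined rule requires "apple pie" and forbids "pie chart".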

Similar resources

A new text-mining method for extracting user context information to improve the ranking of search engine results

Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual and documentary material increases day by day, so we need useful ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the ones most similar to the user ...

INDUCING VALUABLE RULES FROM IMBALANCED DATA: THE CASE OF AN IRANIAN BANK EXPORT LOANS

Negation detection in Swedish clinical text: An adaption of NegEx to Swedish

BACKGROUND: Most methods for negation detection in clinical text have been developed for English text, and there is a need for evaluating the feasibility of adapting these methods to other languages. A Swedish adaption of the English rule-based negation detection system NegEx, which detects negations through the use of trigger phrases, was therefore evaluated. RESULTS: The Swedish adaption of N...

Use of negation phrases in automatic sentiment classification of product reviews

This paper reports a study in automatic sentiment classification, i.e., automatically classifying documents as expressing positive or negative sentiments. The study investigates the effectiveness of using a machine-learning algorithm, support vector machine (SVM), on various text features to classify on-line product reviews into recommended (positive sentiment) and not recommended (negative sen...

Publication date: 2011